DELETION OF PRINCIPAL COMPONENTS IN REGRESSION

R.  F.  Gunst  and  J.  T.  Webster 
Department  of  Statistics 
Southern  Methodist  University 
Dallas,  Texas  75275,  U.  S.  A. 

SUMMARY 

Two techniques generally advocated for the deletion of principal components
in regression analysis are: (i) delete components associated with small
latent roots of X'X, and (ii) delete components following nonrejection of a
statistical test of the significance of the components. The estimator
corresponding to procedure (i) is referred to as a restricted least squares
estimator and that associated with (ii) is called a preliminary test
estimator. Properties of these estimators are examined in this paper with
special attention to the effects of multicollinearities on the preliminary
test estimator. The restricted estimator is recommended for use unless
inferences on the noncentrality parameter of the preliminary test clearly
indicate that the test will have adequate power.

1.  INTRODUCTION 

Principal Component Regression (PCR) has long been employed in conjunction
with tests of hypotheses of the components. This is advocated as a means of
determining whether the components have predictive value. Massy (1965),
Lott (1973), and others have proposed such tests. Recently, in the comparison
of mean squared error properties of biased regression estimators, the PCR
estimator has been used almost exclusively following a preliminary test of
the predictivity of the components. In the simulation comparisons of Lawless
and Wang (1976) and Dempster, Schatzoff, and Wermuth (1977), for example, the
PCR estimators investigated all used preliminary tests. The uniformly poor
performance of these PCR estimators led the authors to conclude that other
biased estimators (notably ridge regression) should be preferred to the PCR
estimator, particularly when the predictor variables are multicollinear.

Gunst and Mason (1977), on the other hand, report that a PCR estimator for
which the predictivity of the components was assessed solely on the basis of
whether the component was associated with a strong multicollinearity among
the predictor variables outperformed the other biased estimators (including
one ridge-type) with which it was compared. The authors suggested that the
discrepancy in the performance of the PCR estimator in this and the previous
investigations might be attributed to the instability of F statistics
typically used in assessing the predictive merit of the components, an
instability pointed out by Mansfield (1975).

This  article  addresses  the  question  of  whether  a preliminary  test  of  the 
predictivity  of  components  in  PCR  should  be  attempted  using  the  standard  F 
statistics  generally  advocated  for  that  purpose.  The  work  of  Bock,  Yancey,
and  Judge  (1973)  on  preliminary  test  estimation  is  central  to  this  discussion 
since  it  provides  the  mean  squared  error  (risk  function)  for  the  PCR  estimator 
based  on  preliminary  tests  of  the  predictivity  of  the  components. 

2.  PRELIMINARY  TEST  ESTIMATION 

We employ the standardized linear regression model

$$Y = \beta_0 \mathbf{1} + X\beta + \varepsilon, \tag{2.1}$$

where $Y$ is an $(n \times 1)$ vector of observable response variables,
$\mathbf{1}$ is an $(n \times 1)$ vector of ones, $X = [X_1, X_2, \ldots, X_p]$
is an $(n \times p)$ full-column-rank matrix of nonstochastic predictor
variables that are standardized so that $\mathbf{1}'X = 0'$ and $X_j'X_j = 1$
for $j = 1, 2, \ldots, p$, $\beta_0$ and $\beta' = (\beta_1, \beta_2, \ldots, \beta_p)$
are unknown constants, and $\varepsilon$ is an $(n \times 1)$ vector of
unobservable random error terms with $\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$.
Denote the latent roots and corresponding latent vectors of $X'X$ by
$\ell_1 \ge \ell_2 \ge \cdots \ge \ell_p$ and $V_1, V_2, \ldots, V_p$,
respectively. Also, let $L = \mathrm{diag}(\ell_1, \ell_2, \ldots, \ell_p)$ and
$V = [V_1, V_2, \ldots, V_p]$.

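The least squares estimator of $\beta$ is

$$\hat\beta = (X'X)^{-1}X'Y = \sum_{j=1}^{p} \ell_j^{-1} c_j V_j, \tag{2.2}$$

where $c_j = V_j'X'Y$. One measure of the adequacy of (2.2) as an estimator of
$\beta$ is the (total) mean squared error of $\hat\beta$:

$$\mathrm{mse}(\hat\beta) = E\{(\hat\beta - \beta)'(\hat\beta - \beta)\}
= \sigma^2\,\mathrm{tr}(X'X)^{-1} = \sigma^2 \sum_{j=1}^{p} \ell_j^{-1}, \tag{2.3}$$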

where tr(A) denotes the trace of the matrix A. A drawback of least squares
estimation of $\beta$ is that when the predictor variables are multicollinear
(2.3) can be extremely large. Multicollinearities are characterized as linear
combinations of the columns of X that are nearly zero (see, e.g., Mason et al.
(1975)). If the columns of X are multicollinear, one or more of the latent
roots of X'X are very close to zero, resulting in one or more terms of (2.3)
being extremely large.
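The inflation of (2.3) can be seen numerically. The following is a minimal sketch, assuming NumPy and two hypothetical simulated predictor matrices (none of these names or data come from the paper): it standardizes the columns, extracts the latent roots of X'X, and evaluates $\sigma^2 \sum_j \ell_j^{-1}$.

```python
import numpy as np

def standardize(X_raw):
    """Center each column and scale it to unit length, so X'X is a correlation matrix."""
    Xc = X_raw - X_raw.mean(axis=0)
    return Xc / np.linalg.norm(Xc, axis=0)

def ls_total_mse(X, sigma2=1.0):
    """Total mse of least squares, eq. (2.3): sigma^2 * sum_j 1/l_j.

    Returns the mse and the latent roots in descending order.
    """
    roots = np.linalg.eigvalsh(X.T @ X)      # latent roots of X'X, ascending
    return sigma2 * np.sum(1.0 / roots), roots[::-1]

# Hypothetical data: the third column is nearly the sum of the first two.
rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2))
near_sum = Z[:, 0] + Z[:, 1] + 0.01 * rng.normal(size=50)
X_collinear = standardize(np.column_stack([Z, near_sum]))
X_plain = standardize(rng.normal(size=(50, 3)))

mse_collinear, roots_collinear = ls_total_mse(X_collinear)
mse_plain, roots_plain = ls_total_mse(X_plain)
```

In this sketch the near-dependency among the columns of `X_collinear` produces one latent root close to zero, and that single root accounts for almost all of `mse_collinear`.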

Biased estimators of $\beta$ have been developed with the intention of reducing
the mean squared errors of the resulting estimators. Since the magnitude of
$\mathrm{mse}(\hat\beta)$ is for the most part controlled by the last few terms
of (2.3) (i.e. by the latent roots of X'X that are closest to zero), one
strategy for developing a biased estimator of $\beta$ is to construct one that
eliminates terms in (2.2) corresponding to small values of $\ell_j$. The
resulting estimator is a principal component estimator of $\beta$.

To derive a PCR estimator of $\beta$, rewrite (2.1) as

$$Y = \beta_0 \mathbf{1} + W\gamma + \varepsilon, \tag{2.4}$$

where $W = XV$, $W = [W_1, W_2, \ldots, W_p]$ with $W_j = XV_j$, $\gamma = V'\beta$,
and $\gamma' = (\gamma_1, \gamma_2, \ldots, \gamma_p)$ with $\gamma_j = V_j'\beta$.
Note that the least squares estimator of $\gamma$ is

$$\hat\gamma = L^{-1}W'Y;$$

i.e., $\hat\gamma_j = \ell_j^{-1}W_j'Y$ and
$\hat\gamma_j \sim \mathrm{NID}(\gamma_j, \ell_j^{-1}\sigma^2)$. The $W_j$ in (2.4)
are referred to as the principal components of X. PCR deletes some of the
components from (2.4) and estimates the coefficients of the remaining
components by least squares.

Let $\tilde\gamma$ denote an estimator of $\gamma$ in which some coefficients
have been set to zero (note that this is equivalent to deleting the
corresponding components from (2.4)) and the remaining ones estimated by least
squares. The associated PCR estimator of $\beta$ is then
$\tilde\beta = V\tilde\gamma$.

Basically  two  procedures  have  been  recommended  in  the  literature  for 
selecting  components  to  delete  in  PCR.  Massy  (1965)  summarizes  these  as:

(i)  delete  components  associated  with  the  smallest  latent  roots  of  X'X,  or 

(ii)  delete  components  that  are  unimportant  as  predictors  of  the  response 
variable. 

Suppose first that the components associated with the s smallest latent
roots of X'X are to be deleted from (2.4). The PCR estimator of $\gamma$ becomes

$$\tilde\gamma' = (\hat\gamma_1, \hat\gamma_2, \ldots, \hat\gamma_{p-s}, 0, 0, \ldots, 0),$$

and the PCR estimator of $\beta$ is

$$\tilde\beta_R = V_L \hat\gamma_L, \tag{2.5}$$

where $V = [V_L : V_S]$, $V_L = [V_1, V_2, \ldots, V_{p-s}]$,
$V_S = [V_{p-s+1}, V_{p-s+2}, \ldots, V_p]$, and
$\hat\gamma_L' = (\hat\gamma_1, \hat\gamma_2, \ldots, \hat\gamma_{p-s})$.

An equivalent procedure for deriving the PCR estimator (2.5) is to estimate
$\beta$ by least squares subject to the restriction $V_S'\beta = 0$. Thus, this
PCR estimator is a restricted least squares estimator. It is important to note
that the restrictions are determined solely by an examination of X'X and its
latent roots and latent vectors and not as a result of inferences made using
the response variable. Hence the restrictions are nonstochastic.
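The construction of the restricted estimator can be sketched in a few lines. This is an illustrative sketch only, assuming NumPy, a standardized X, and hypothetical simulated data; the function name `pcr_restricted` is not from the paper.

```python
import numpy as np

def pcr_restricted(X, y, s):
    """PCR estimate of beta with the s smallest-root components deleted.

    X is assumed standardized (centered, unit-length columns).
    Returns the estimate and V_S, the latent vectors of the deleted components.
    """
    roots, V = np.linalg.eigh(X.T @ X)           # ascending latent roots
    order = np.argsort(roots)[::-1]              # reorder so l_1 >= ... >= l_p
    roots, V = roots[order], V[:, order]
    W = X @ V                                    # principal components, W'W = diag(roots)
    gamma_hat = (W.T @ (y - y.mean())) / roots   # gamma_hat_j = W_j'Y / l_j
    keep = X.shape[1] - s
    V_L, V_S = V[:, :keep], V[:, keep:]
    beta_r = V_L @ gamma_hat[:keep]              # V_L gamma_hat_L, as in eq. (2.5)
    return beta_r, V_S

# Hypothetical data for illustration.
rng = np.random.default_rng(1)
X_raw = rng.normal(size=(40, 4))
Xc = X_raw - X_raw.mean(axis=0)
X = Xc / np.linalg.norm(Xc, axis=0)
y = 2.0 + X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(size=40)

beta_full, _ = pcr_restricted(X, y, s=0)   # no deletion: ordinary least squares
beta_r, V_S = pcr_restricted(X, y, s=1)    # smallest-root component deleted
```

With `s=0` the function reproduces ordinary least squares, and for `s>0` the returned estimate satisfies the restriction that its projection onto the deleted latent vectors is zero.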

The mean squared error of $\tilde\beta_R$ is

$$\mathrm{mse}(\tilde\beta_R) = E\{(\tilde\beta_R - \beta)'(\tilde\beta_R - \beta)\}
= \sigma^2 \sum_{j=1}^{p-s} \ell_j^{-1} + \beta'V_S V_S'\beta. \tag{2.6}$$

Comparison of (2.3) and (2.6) reveals that the restricted least squares
estimator of $\beta$ does indeed eliminate the largest terms of (2.3) but at the
cost of introducing a term due to bias:
$\beta'V_S V_S'\beta = \sum_{j=p-s+1}^{p} (V_j'\beta)^2$. If the restrictions
$V_S'\beta = 0$ are true there is no bias term in (2.6); otherwise, the bias is
nonzero and could potentially be larger than the terms eliminated from (2.3)
by imposing the restrictions.
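The trade-off between the eliminated variance terms and the bias term can be made concrete. The sketch below assumes NumPy and uses hypothetical latent roots and hypothetical coefficients $\gamma_j = V_j'\beta$ chosen only for illustration.

```python
import numpy as np

def mse_ls(roots, sigma2=1.0):
    """Eq. (2.3): sigma^2 * sum_j 1/l_j."""
    return sigma2 * np.sum(1.0 / np.asarray(roots, dtype=float))

def mse_restricted(roots, gamma, s, sigma2=1.0):
    """Eq. (2.6): variance of the retained components plus the squared bias,
    sum over the s deleted components of gamma_j^2, with gamma_j = V_j'beta."""
    roots = np.asarray(roots, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    keep = len(roots) - s
    return sigma2 * np.sum(1.0 / roots[:keep]) + np.sum(gamma[keep:] ** 2)

# Hypothetical latent roots (descending, summing to p = 4) and coefficients.
roots = [2.5, 1.0, 0.45, 0.05]
gamma_small = [1.0, 1.0, 1.0, 2.0]   # gamma_4^2 = 4  is below sigma^2 / l_4 = 20
gamma_large = [1.0, 1.0, 1.0, 6.0]   # gamma_4^2 = 36 is above sigma^2 / l_4 = 20

ls = mse_ls(roots)
r_small = mse_restricted(roots, gamma_small, s=1)
r_large = mse_restricted(roots, gamma_large, s=1)
```

Deleting the last component pays off in the first case, where the squared bias is smaller than the variance term it replaces, and not in the second.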

Next suppose Massy's (1965) second recommendation is adopted. In particular,
suppose one wishes to delete the jth principal component if a test of the
hypothesis $\gamma_j = V_j'\beta = 0$ is not rejected. For the moment, consider
testing jointly $\gamma_S = V_S'\beta = 0$ and using the least squares estimator
of $\beta$, (2.2), if this hypothesis is rejected. If the hypothesis is not
rejected, the restricted least squares estimator (2.5) is employed. This
estimator, referred to as a preliminary test estimator, is also a PCR estimator.


Often the hypothesis $V_S'\beta = 0$ is tested by calculating

$$F_H = \hat\beta'V_S \,(V_S'(X'X)^{-1}V_S)^{-1}\, V_S'\hat\beta \,/\, (s\,\mathrm{MSE}), \tag{2.7}$$

where $\hat\beta$ is the least squares estimator of $\beta$, (2.2), and MSE is
the unbiased least squares estimator of $\sigma^2$,
$\mathrm{MSE} = Y'\{I - n^{-1}\mathbf{1}\mathbf{1}' - X(X'X)^{-1}X'\}Y/(n-p-1)$.
If, for a preselected value of $\alpha$, $F_H > c$, where

$$\alpha = \Pr\{F > c\} \tag{2.8}$$

and $F$ is a central F random variable with degrees of freedom s and (n-p-1),
the hypothesis $V_S'\beta = 0$ is rejected and $\hat\beta$ is used to estimate
$\beta$. If $F_H \le c$, $\tilde\beta_R$ is the estimator of $\beta$.
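The test-then-estimate rule can be written out directly. The following is a sketch under stated assumptions rather than a definitive implementation: it assumes NumPy and SciPy, a standardized X, and hypothetical simulated data; `preliminary_test` is an illustrative name.

```python
import numpy as np
from scipy import stats

def preliminary_test(X, y, s, alpha=0.05):
    """F_H of eq. (2.7) for the s smallest-root components, the critical point c
    of eq. (2.8), and the resulting preliminary test estimate of beta."""
    n, p = X.shape
    yc = y - y.mean()
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ yc)        # least squares, eq. (2.2)
    resid = yc - X @ beta_hat
    mse = resid @ resid / (n - p - 1)                # unbiased estimate of sigma^2

    roots, V = np.linalg.eigh(XtX)                   # ascending latent roots
    V_S, V_L = V[:, :s], V[:, s:]                    # s smallest roots come first
    r = V_S.T @ beta_hat
    middle = V_S.T @ np.linalg.solve(XtX, V_S)       # V_S'(X'X)^{-1}V_S
    F_H = r @ np.linalg.solve(middle, r) / (s * mse)

    c = stats.f.ppf(1.0 - alpha, s, n - p - 1)
    beta_r = V_L @ (V_L.T @ beta_hat)                # restricted estimator, eq. (2.5)
    beta_pt = beta_hat if F_H > c else beta_r
    return F_H, c, beta_pt

# Hypothetical data for illustration.
rng = np.random.default_rng(2)
X_raw = rng.normal(size=(30, 3))
Xc = X_raw - X_raw.mean(axis=0)
X = Xc / np.linalg.norm(Xc, axis=0)
y = 1.0 + X @ np.array([5.0, -3.0, 4.0]) + rng.normal(size=30)
F_H, c, beta_pt = preliminary_test(X, y, s=1)
```

Because the columns of $V_S$ are latent vectors of X'X, the quadratic form in (2.7) reduces to $\sum_j \ell_j \hat\gamma_j^2$ over the tested components, which is a convenient check on the general formula.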

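Bock et al. (1973) concisely represent the preliminary test estimator as

$$\tilde\beta_{PT} = I_{(0,c)}(F_H)\,\tilde\beta_R + I_{(c,\infty)}(F_H)\,\hat\beta, \tag{2.9}$$

where $I_{(a,b)}(u) = 1$ if $a \le u < b$ and equals 0 otherwise. From eq. (3.7)
of Bock et al. (1973), in the notation of this paper,

$$\mathrm{mse}(\tilde\beta_{PT}) = E\{(\tilde\beta_{PT} - \beta)'(\tilde\beta_{PT} - \beta)\}
= \sigma^2 \Big[\sum_{j=1}^{p} \ell_j^{-1} - p_1(\lambda) \sum_{j=p-s+1}^{p} \ell_j^{-1}\Big]
+ (2p_1(\lambda) - p_2(\lambda))\,\beta'V_S V_S'\beta, \tag{2.10}$$

where

$$p_j(\lambda) = \Pr\{F'(s+2j,\, n-p-1,\, \lambda) \le cs/(s+2j)\}, \tag{2.11}$$

and $F'(s+2j, n-p-1, \lambda)$ is a noncentral F random variable with s+2j and
(n-p-1) degrees of freedom and noncentrality parameter $\lambda$. When c = 0
(always reject) both probabilities vanish and (2.10) reduces to (2.3); as c
grows without bound (never reject) both tend to one and (2.10) reduces to (2.6).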
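The risk expression (2.10) can be evaluated numerically. The sketch below assumes SciPy and takes the numerator degrees of freedom of the probabilities $p_j(\lambda)$ to be s + 2j, as in the preliminary test literature (an assumption stated here explicitly). It also assumes that SciPy's noncentrality argument equals $2\lambda$, because $\lambda$ as defined in (3.1) carries a factor 1/2. The latent roots are configuration (iii) of Section 4; the ratio $\lambda/\ell_p = 2$ is a hypothetical choice.

```python
import numpy as np
from scipy import stats

def p_j(j, lam, s, nu, c):
    """Probability in eq. (2.11), with numerator degrees of freedom s + 2j (assumed)."""
    dfn = s + 2 * j
    x = c * s / dfn
    if lam == 0:
        return stats.f.cdf(x, dfn, nu)
    return stats.ncf.cdf(x, dfn, nu, 2.0 * lam)   # scipy nc = 2 * lambda (assumed mapping)

def mse_pt(roots, bias_sq, lam, s, nu, c, sigma2=1.0):
    """Eq. (2.10). roots are in descending order; bias_sq = beta'V_S V_S'beta."""
    roots = np.asarray(roots, dtype=float)
    p1, p2 = p_j(1, lam, s, nu, c), p_j(2, lam, s, nu, c)
    deleted = np.sum(1.0 / roots[len(roots) - s:])
    return sigma2 * (np.sum(1.0 / roots) - p1 * deleted) + (2.0 * p1 - p2) * bias_sq

roots = [2.99, 1.00, 0.70, 0.30, 0.01]   # configuration (iii) of Section 4
ratio = 2.0                               # hypothetical value of lambda / l_p
lam = roots[-1] * ratio                   # lambda = l_p * ratio
bias_sq = 2.0 * ratio                     # (V_p'beta)^2 = 2 sigma^2 * ratio, sigma^2 = 1
c = stats.f.ppf(0.95, 1, 10)              # alpha = .05, s = 1, nu = 10
pt_value = mse_pt(roots, bias_sq, lam, s=1, nu=10, c=c)
```

Setting `c=0.0` recovers the least squares value (2.3), and a very large `c` recovers the restricted value (2.6), which is a useful sanity check on any implementation of (2.10).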

In the remaining sections of this paper we wish to compare the least
squares ($\hat\beta$), restricted least squares ($\tilde\beta_R$), and
preliminary test ($\tilde\beta_{PT}$) estimators of $\beta$ with specific
attention focused on multicollinear predictor variables. By examining the
characteristics of the preliminary test estimator in particular, the apparent
conflicts between the two PCR estimators may be more clearly understood.


3.  POWER 

An appreciation of the effects of multicollinearities on the power of a
preliminary test is important to the consideration of the relative merits of
the three estimators defined in the previous section. For simplicity let us
consider a test of $H_0: V_p'\beta = 0$ vs $H_a: V_p'\beta \ne 0$, where $V_p$
is the latent vector of X'X corresponding to the smallest latent root,
$\ell_p$. In performing this test we are attempting to determine whether the
pth principal component of X is important in predicting the response variable,
since $\gamma_p = V_p'\beta$. If $F_H$ (eq. (2.7)) is used as the test
statistic, resulting in the uniformly most powerful test of $H_0: \gamma_p = 0$,
the noncentrality parameter of $F_H$ is

$$\lambda = \ell_p (V_p'\beta)^2 / (2\sigma^2). \tag{3.1}$$

As a function of $\ell_p$ observe that $\lambda$ (and hence the power of the
test) decreases as the multicollinearity indicated by $V_p$ becomes stronger
(for fixed $V_p$, $\beta$, and $\sigma^2$) since $\ell_p$ is approaching zero in
(3.1). So the stronger the multicollinearity, the less the power of the test;
yet $F_H$ is often proposed for use when the predictor variables are extremely
multicollinear. Qualitatively, this statistic appears to be a poor choice for
assessing the predictivity of the components.

To illustrate the dramatic effects of multicollinearity on the power of the
test quantitatively, Figure 1 exhibits power curves associated with $F_H$ as a
function of $\lambda/\ell_p = (V_p'\beta)^2/(2\sigma^2)$ for a regression model
with error degrees of freedom $\nu = n-p-1 = 10$ and two selections of
$\ell_p$: 1.0 and 0.01. Since X is assumed standardized, $\ell_p = 1.0$ implies
that X is an orthogonal matrix and no multicollinearities exist in the data.

[Insert Figure 1]


For fixed $\lambda/\ell_p$ (recall, $\lambda/\ell_p = (V_p'\beta)^2/(2\sigma^2)$),
Figure 1 reveals that the power drops precipitously when $\ell_p$ decreases
from 1.0 to .01. Thus with strongly multicollinear data there is a much greater
likelihood that $V_p'\beta = 0$ will not be rejected than if the predictor
variables were not multicollinear. This is especially true for small
$\alpha$-levels and the results generalize to simultaneous tests of the
hypotheses $H_0: V_S'\beta = 0$.
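The loss of power described above can be reproduced for a single component. This is a minimal sketch assuming SciPy's noncentral F distribution, with SciPy's noncentrality argument taken to be $2\lambda$ (since $\lambda$ in (3.1) includes the factor 1/2); the ratio 5.0 and the 5% level are hypothetical choices, not values read from Figure 1.

```python
from scipy import stats

def power_central_test(ratio, root, nu=10, alpha=0.05):
    """Power of the test comparing F_H (one component, s = 1) with a central F
    critical point, where ratio = lambda / l_p = (V_p'beta)^2 / (2 sigma^2)."""
    c = stats.f.ppf(1.0 - alpha, 1, nu)
    lam = root * ratio                          # eq. (3.1)
    if lam == 0:
        return stats.f.sf(c, 1, nu)             # size of the test
    return stats.ncf.sf(c, 1, nu, 2.0 * lam)    # scipy nc = 2 * lambda (assumed mapping)

power_orthogonal = power_central_test(ratio=5.0, root=1.0)    # l_p = 1.0
power_collinear = power_central_test(ratio=5.0, root=0.01)    # l_p = 0.01
```

For the same value of $\lambda/\ell_p$, the noncentrality in the multicollinear case is one hundredth of that in the orthogonal case, so `power_collinear` stays close to the nominal level while `power_orthogonal` does not.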

In a more general setting than that considered in this paper,
Toro-Vizcarrondo and Wallace (1968) studied preliminary testing from a slightly
different viewpoint. Denote a least squares estimator of $\beta$ subject to the
general restrictions $K\beta = m$ by $\tilde\beta_K$. Toro-Vizcarrondo and
Wallace (1968) define $\tilde\beta_K$ to be "better" than the least squares
estimator $\hat\beta$ if for every vector $d$

$$\mathrm{mse}(d'\tilde\beta_K) \le \mathrm{mse}(d'\hat\beta). \tag{3.2}$$

Translating this discussion to the problem being investigated in this paper,
$\tilde\beta_R$ (the principal component estimator with $\tilde\gamma_p = 0$) is
"better" than $\hat\beta$ if for every $d$ (3.2) holds (replacing
$\tilde\beta_K$ with $\tilde\beta_R$). Toro-Vizcarrondo and Wallace also showed
that

$$\mathrm{mse}(d'\tilde\beta_R) \le \mathrm{mse}(d'\hat\beta) \ \text{for all}\ d
\iff \lambda \le 1/2,$$

where $\lambda$ is the noncentrality parameter of $F_H$ (eq. (2.7)). They then
proved that a uniformly most powerful test of $H_0: \lambda \le 1/2$ vs
$H_a: \lambda > 1/2$ is to reject $H_0$ if $F_H$ is greater than the upper
$100(1-\alpha)\%$ critical point of a noncentral F distribution with
noncentrality parameter 1/2, i.e. reject $H_0$ if
$F_H > F_\alpha(1, \nu; \lambda = 1/2)$ and do not reject otherwise.
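The critical point of this test differs from the usual one only in the reference distribution. A minimal sketch for a single component follows, assuming SciPy and, as before, that SciPy's noncentrality argument equals $2\lambda$ (so $\lambda = 1/2$ corresponds to `nc = 1.0`); the function names are illustrative.

```python
from scipy import stats

def tvw_critical_point(nu, alpha=0.05):
    """Critical point for H0: lambda <= 1/2 with one component, taken from a
    noncentral F(1, nu) with lambda = 1/2, i.e. scipy nc = 1 (assumed mapping)."""
    return stats.ncf.ppf(1.0 - alpha, 1, nu, 1.0)

def tvw_power(ratio, root, nu=10, alpha=0.05):
    """Rejection probability of the test when lambda = root * ratio."""
    c = tvw_critical_point(nu, alpha)
    lam = root * ratio
    if lam == 0:
        return stats.f.sf(c, 1, nu)
    return stats.ncf.sf(c, 1, nu, 2.0 * lam)

c_tvw = tvw_critical_point(nu=10)
c_central = stats.f.ppf(0.95, 1, 10)
```

Since `c_tvw` exceeds `c_central`, the test rejects less often than the traditional test at every value of $\lambda$, and its rejection probability equals $\alpha$ exactly at the boundary $\lambda = 1/2$.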

Since the noncentrality parameter of this test is identical to (3.1), the
effects of multicollinear predictor variables will be the same as those of the
more traditional test of comparing $F_H$ with a central F critical point.
Figure 2 displays power curves for the test proposed by Toro-Vizcarrondo and
Wallace for the same model parameters as in Figure 1. The power curves are
slightly lower in Figure 2 than those in Figure 1 and the debilitating
influence of strong multicollinearities on the power is again clearly evident.

[Insert  Figure  2] 

One conclusion that is readily apparent from Figures 1 and 2 is that
$\lambda/\ell_p = (V_p'\beta)^2/(2\sigma^2)$ must be very large before
reasonable power will be achieved with either of the above tests when the data
contain strong multicollinearities. Thus when the predictor variables are
strongly multicollinear, the preliminary test estimator will tend to reduce to
the restricted least squares estimator unless $\lambda/\ell_p$ is very large;
however, even when $\lambda/\ell_p$ is small or moderate in magnitude, the
preliminary test rejects $H_0: V_p'\beta = 0$ frequently enough to make it more
advantageous to use the restricted least squares estimator for these values of
$\lambda/\ell_p$, as we shall now see.


4.  MEAN  SQUARED  ERRORS 

The mean squared errors of the least squares (LS), restricted least
squares (R), and preliminary test (PT) estimators were given in equations
(2.3), (2.6), and (2.10), respectively. Graphs of the mean squared errors as a
function of $\lambda/\ell_p$ are exhibited in Figures 3 and 4. In computing the
mean squared errors, three p=5, n-p-1=10 X matrices were studied, each defined
by the following sets of latent roots:

(i) $\ell_1 = \ell_2 = \ell_3 = \ell_4 = \ell_5 = 1.0$ (orthogonal X matrix)

(ii) $\ell_1 = 2.90$, $\ell_2 = 1.00$, $\ell_3 = 0.70$, $\ell_4 = 0.30$,
$\ell_5 = 0.10$ (moderate multicollinearity)

(iii) $\ell_1 = 2.99$, $\ell_2 = 1.00$, $\ell_3 = 0.70$, $\ell_4 = 0.30$,
$\ell_5 = 0.01$ (strong multicollinearity).

In all cases $\sigma^2 = 1.0$ was used.


Figure 3 displays the mean squared error curves for models (i) and (iii)
over the same range of $\lambda/\ell_p$ as in Figures 1 and 2. Consider first
the orthogonal data, the lower three curves in Figure 3. Unless
$\lambda/\ell_p$ is very small (less than 0.5 for restricted least squares and
about 0.4 for the preliminary test estimator), least squares has a smaller
mean squared error than either principal component estimator, although the
difference in mean squared errors is never very great between LS and PT. If
$\lambda/\ell_p$ is not extremely small, moreover, the restricted least squares
estimator is clearly inferior to the least squares and preliminary test
estimators. But our major concern here is with multicollinear data.

[Insert  Figure  3] 

The upper three curves of Figure 3 display the mean squared errors for
the strong multicollinearity ($\ell_5 = .01$). The relationships among the mean
squared errors are completely changed from the orthogonal data. Over the
entire range of $\lambda/\ell_p$ in Figure 3, the restricted least squares
estimator has a substantially smaller mean squared error than least squares
and the preliminary test estimator. Further, the mean squared error for PT is
much smaller than that for least squares. From Figures 1 and 2 one can again
observe that the power of the test of either $\lambda = 0$ (Figure 1) or
$\lambda \le 1/2$ (Figure 2) is low, accounting for the smaller mean squared
error for the preliminary test estimator than least squares. Yet the
hypotheses are rejected frequently enough to force the mean squared error for
the preliminary test estimator to be much larger than that of the restricted
least squares estimator over this range of $\lambda/\ell_p$. In fact
$\lambda/\ell_p$ must be quite large before the mean squared error of the
restricted least squares estimator will exceed that of least squares or the
preliminary test estimator. This is illustrated in Figure 4.
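How large $\lambda/\ell_p$ must be before restricted least squares loses to least squares follows from equating (2.3) and (2.6) with one component deleted: the two are equal when $(V_p'\beta)^2 = \sigma^2/\ell_p$, that is, when $\lambda/\ell_p = 1/(2\ell_p)$. The short sketch below evaluates this break-even ratio for the three values of the smallest latent root used in this section; the function name is illustrative.

```python
def break_even_ratio(root):
    """Value of lambda / l_p at which the restricted and least squares estimators
    have equal mse when the single component with latent root `root` is deleted:
    (V_p'beta)^2 = sigma^2 / l_p, i.e. lambda / l_p = 1 / (2 l_p)."""
    return 1.0 / (2.0 * root)

# Smallest latent roots of configurations (i), (ii) and (iii).
thresholds = {root: break_even_ratio(root) for root in (1.0, 0.10, 0.01)}
```

The value for the orthogonal configuration agrees with the 0.5 quoted above for Figure 3, and the break-even ratio grows a hundredfold when the smallest root shrinks from 1.0 to 0.01.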


[Insert  Figure  4] 




Model configurations (ii) and (iii) are plotted in Figure 4 (note that
both curves for the restricted estimator are so close to one another that
only one has been graphed). The effects of the strength of the
multicollinearities can be appreciated by comparing the curves as the
multicollinearity is strengthened (e.g. as $\ell_5$ changes from .10 to .01).
In general, the stronger the multicollinearity the wider the range of
$\lambda/\ell_p$ for which the restricted least squares estimator is superior
(in a mean squared error sense) to the other two estimators. Conversely, if
$\ell_p$ is not small the restricted least squares estimator can be greatly
inferior to the least squares and preliminary test estimators.

With strongly multicollinear data, therefore, the restricted least
squares estimator - the principal component estimator with components
associated with small latent roots deleted - is preferable to least squares or
a preliminary test estimator unless $\lambda/\ell_p$ is extremely large. From
Figures 1 and 2 note that large values of $\lambda/\ell_p$ are also required to
ensure adequate power for the test of predictivity of multicollinearities.
Unfortunately, $\lambda/\ell_p$ is unknown. Whether sample data can provide
useful information on the magnitude of $\lambda/\ell_p$ is the subject of the
next section.

5.  PRACTICAL  CONSIDERATIONS 

Ideally, we should not use the data to indicate which regression estimator
should be employed in a specific analysis. Just as using a preliminary test to
decide whether to employ least squares or a restricted least squares estimator
actually produces an estimator that is a mixture of the two, so too if one
must estimate $\lambda/\ell_j$ before deciding whether to employ LS, R, or PT,
the actual estimator utilized is also a mixture of LS and R. So what is one to
do?

In the absence of exact knowledge regarding the value of $\lambda/\ell_j$, we
still prefer to allow the data to suggest an appropriate estimator despite the
mixture problems, but only to a limited extent. If inferences on
$\lambda/\ell_j$ clearly suggest that the power of the preliminary test is
good, we employ the preliminary test estimator. Otherwise, we prefer the
restricted least squares estimator. An example will illustrate our application
of the results of Sections 3 and 4.

The pitprop data of Jeffers (1967) concern the construction of a
prediction equation for the maximum compressive strength of timbers from
information provided by thirteen measurements on each timber (such as length,
diameter, and specific gravity). The data collected on 180 timbers indicate
that there are three strong multicollinearities among the predictor variables
as evidenced by three small latent roots of X'X: .0387, .0414, and .0506
(the next smallest root is .1908).

Scaled mean squared errors (mse/$\sigma^2$) of the least squares, restricted
least squares, and preliminary test estimators are given in Figure 5. For
simplicity the scaled mean squared errors of the two principal component
estimators are drawn as though a single small latent root of magnitude .05
is being deleted (all three small latent roots of the pitprop data are close
to .05). We are going to examine the deletion of each of the three components
separately although we could consider them two at a time or all three
simultaneously with only minor modifications of the following arguments.

[Insert  Figure  5] 

As mentioned earlier, $F_H$ is the uniformly most powerful test of
$V_j'\beta = 0$. The previous sections have shown that it is also effective in
selecting whether to use least squares or restricted least squares provided
that the noncentrality parameter is not too small. Rather than blindly using
the statistic, we advocate using the data to obtain information on the
noncentrality parameter.

Point estimates of $\lambda_j/\ell_j$, j = 11, 12, and 13, using
$\lambda_j/\ell_j = (V_j'\beta)^2/(2\sigma^2)$ with least squares estimates of
$\beta$ and $\sigma^2$, are $\hat\lambda_{11}/\ell_{11} = 4.13$,
$\hat\lambda_{12}/\ell_{12}$ = [illegible], and
$\hat\lambda_{13}/\ell_{13} = 0.001$.

Unfortunately, the estimators used to obtain these values have extremely large
variances due to the small magnitudes of the latent roots (cf. Silvey (1969)).
Thus the point estimates cannot be trusted and interval estimates will tend
to be wide. The interval estimates nevertheless provide interesting
implications for this data set.

Two-sided 95% confidence intervals were computed for $\lambda_j/\ell_j$,
j = 11-13, using the noncentral F distribution associated with $F_H$. For
$\lambda_{11}/\ell_{11}$ the confidence interval is

$$0 \le \lambda_{11}/\ell_{11} \le 66.67.$$

As expected, this interval is quite wide and includes not only the range of
$\lambda/\ell$ included in Figure 5 but it also extends beyond that range. For
the reasons expressed in the two previous sections for being wary of the
preliminary test estimator, we prefer to use the restricted estimator when
there is doubt as to which is more appropriate. So we advocate using the
restricted estimator to delete the principal component associated with
$\ell_{11}$. The choice of an estimator is more clear cut for the other two
components.

The 95% confidence interval for $\lambda_{12}/\ell_{12}$ is

$$125.41 \le \lambda_{12}/\ell_{12} \le [\text{illegible}].$$

Again the confidence interval is wide; however, the lower bound is about an
order of magnitude larger than the preferential range for the restricted
estimator in Figure 5. The power of $F_H$ is good over this interval (greater
than .8) so the preliminary test is recommended. With $F_H = 27.80$, the
preliminary test indicates that the component associated with $\ell_{12}$
should be retained in the estimator.

Finally, consider a confidence interval for $\lambda_{13}/\ell_{13}$. The
actual value of the point estimate of this parameter is 0.000986. Using the
central F distribution associated with $F_H$, the observed value of $F_H$
falls in the lower 1% of the population for $\lambda = 0$. Thus a 95%
two-sided confidence interval for $\lambda_{13}/\ell_{13}$ cannot be
calculated. We can, therefore, conclude with a high degree of certitude that
the restricted least squares estimator should be employed to delete this
component.
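Interval estimates of this kind can be obtained by inverting the noncentral F distribution in its noncentrality parameter. The sketch below is one way to do so, not necessarily the authors' computation. It assumes SciPy, error degrees of freedom 180 - 13 - 1 = 166, latent roots assigned in descending order ($\ell_{11} = .0506$, $\ell_{12} = .0414$, $\ell_{13} = .0387$), SciPy's noncentrality equal to $2\lambda$, and, for the components whose $F_H$ is not quoted, the single-component identity $F_H = 2\ell_j(\hat\lambda_j/\ell_j)$. The search bound `nc_max` is a hypothetical choice.

```python
from scipy import stats
from scipy.optimize import brentq

def ncf_cdf(f_obs, dfn, dfd, nc):
    """CDF of the (non)central F; nc is scipy's noncentrality (2 * lambda here)."""
    if nc == 0:
        return stats.f.cdf(f_obs, dfn, dfd)
    return stats.ncf.cdf(f_obs, dfn, dfd, nc)

def ratio_interval(f_obs, root, dfd, dfn=1, level=0.95, nc_max=200.0):
    """Two-sided interval for lambda / l_j from the observed F_H.

    Returns None when f_obs lies in the lower tail even for nc = 0, in which
    case no noncentrality value is compatible with the observation.
    """
    tail = (1.0 - level) / 2.0
    if ncf_cdf(f_obs, dfn, dfd, 0.0) < tail:
        return None

    def solve(target):
        g = lambda nc: ncf_cdf(f_obs, dfn, dfd, nc) - target
        # The CDF decreases in nc; if it is already below target at 0, the bound is 0.
        return brentq(g, 0.0, nc_max) if g(0.0) > 0 else 0.0

    nc_low, nc_high = solve(1.0 - tail), solve(tail)
    # lambda = nc / 2, then divide by the latent root to get lambda / l_j.
    return nc_low / (2.0 * root), nc_high / (2.0 * root)

interval_12 = ratio_interval(27.80, 0.0414, dfd=166)
interval_11 = ratio_interval(2 * 0.0506 * 4.13, 0.0506, dfd=166)
interval_13 = ratio_interval(2 * 0.0387 * 0.000986, 0.0387, dfd=166)
```

Under these assumptions `interval_13` is `None`, mirroring the remark that no two-sided interval can be calculated for that component, and the bounds in `interval_11` and `interval_12` can be compared with those quoted in the text.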

In summary, confidence intervals on $\lambda_j/\ell_j$ clearly indicate that
the preliminary test and restricted least squares estimators should be
utilized on the last two principal components, respectively. Concern over the
reduced power of the preliminary test estimator when testing components
associated with multicollinearities leads us to recommend employing the
restricted estimator on the principal component associated with $\ell_{11}$.
The conclusion is that the components associated with $\ell_{11}$ and
$\ell_{13}$ should be deleted but the preliminary test forces us to retain the
component corresponding to $\ell_{12}$.

This use of confidence intervals to obtain information on $\lambda/\ell_j$ is
actually equivalent to simply performing a preliminary test with a very small
significance level. By insisting that the confidence intervals provide
relative certitude that $\lambda/\ell_j$ is large enough to enable the
preliminary test to have adequate power, we are in effect insisting that $F_H$
be extremely far out in the upper tail of the associated central F
distribution. What we have attempted to convey through this example, however,
is our rationale for demanding such a small $\alpha$ level. The previous
sections have presented the basis for our recommendation that unless one is
relatively sure that $\lambda/\ell_j$ is not small the restricted least squares
estimator should be preferred to the preliminary test estimator. The
simulations referred to in the Introduction also attest to this
recommendation.


6.  CONCLUSION 


Motivated by discrepancies in conclusions drawn from simulations comparing
the performance of regression estimators, this paper has examined
characteristics of the two most widely recommended principal component
regression estimators. The power of the preliminary test was shown to be
severely reduced by multicollinearities among the predictor variables, yet
the test is proposed to ascertain whether these same multicollinearities
have predictive value. If the noncentrality parameter is not sufficiently
large to dominate the small latent root associated with the multicollinearity,
not only is the power poor but the mean squared error of the preliminary
test estimator is also larger than that of the restricted least squares
estimator.

The deleterious effects of multicollinearities on the preliminary test
estimator lead one to infer that it should not routinely be used. Our
preference is to use the restricted estimator unless the data yield clear
evidence that the power of the preliminary test will be sufficiently large
to render good inferences.
FIGURE CAPTIONS


Figure 1. Power Curves For Testing $H_0: \lambda = 0$ ($\nu = 10$).

Figure 2. Power Curves For Testing $H_0: \lambda \le 1/2$ ($\nu = 10$).

Figure 3. Mean Squared Errors For Least Squares (LS), Preliminary Test
(PT), And Restricted Least Squares (R) Estimators.

Figure 4. Mean Squared Errors For Least Squares (LS), Preliminary Test
(PT), And Restricted Least Squares (R) Estimators.

Figure 5. Scaled Mean Squared Errors, Pitprop Data With One Component
($\ell = .05$) Deleted.